DPDM Experiential Learning : Data Visualization

Group 4

  • Aman Garg : 21030242007
  • Amandeep Sharma : 21030242008
  • Jaikishan Patel : 21030242070
  • Rut Patel : 21030242072

Dataset : US Police Shootings
Data Source : https://www.kaggle.com/ahsen1330/us-police-shootings

What is Data Visualization?

Applications of Data Visualization

  1. Healthcare: Data visualization helps build dashboards that show a patient's history, which can help doctors analyse reports.

  2. Finance: Finance is one of the most number-driven industries. Data visualization can help with customer analysis and risk assessment.

  3. Manufacturing: Data visualization can be used to analyse the demand for and supply of current stock.

Advantages of Data Visualization

  1. The human mind understands and learns from visuals quickly, so data visualization makes it easier to see underlying trends in the data.

  2. It helps people understand a situation quickly so that decisions can be made faster.

  3. People do not need to have a statistical or data science background to understand the graphical representations.

Some of the tools available in the market for data visualization

Graphs used in our case study

  1. Bar Graph : Shows rectangular bars on an x-y plane, with the length of each bar representing a value in the data.

  2. Line Graph : Data points are connected by a line to show a trend over a period of time.

  3. Pie Chart : A circular chart divided into sectors. Each sector represents a percentage of the whole.

  4. Box plot : Shows the distribution of the data, including the median, first quartile (Q1), third quartile (Q3) and interquartile range. Box plots also give a sense of the outliers in the data.

  5. Distplot : Shows the distribution of a single variable by plotting its values against their density.

  6. Heatmap : Shows the magnitude of a phenomenon as color in two dimensions.

Data Visualization using Python

Libraries used:

  1. Pandas
    • Provides data structures such as DataFrame and Series.
    • Can read different file formats, such as Excel and CSV, as DataFrames.
    • Provides ability to process data easily.

  2. NumPy
    • Short for Numerical Python.
    • Provides array and matrix data structures that are much faster for numerical work than Python's built-in lists.
    • Provides many mathematical functions for processing.

  3. Statsmodels
    • Provides classes and functions that help users run statistical tests on a dataset.
    • Uses Matplotlib for its graphical output.

  4. SciPy
    • Used for scientific and technical computing.
    • Contains modules for optimization, linear algebra, image processing and more.

  5. Matplotlib
    • Used for plotting 2D figures like graphs, histograms, scatter plots and more.
    • Builds on NumPy and can plot data held in Pandas objects.

  6. Seaborn
    • It is based on Matplotlib but generates more attractive figures.
    • Provides easier customization.

  7. Plotly
    • It is another library for plotting figures.
    • Provides even more customization than Matplotlib and Seaborn, such as hover tools.

Preliminary data analysis

1. Top 10 weapons victims were armed with
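
The notebook's own code for this step is not shown here, so the following is only a minimal sketch of how such a ranking could be computed with Pandas. The column name "armed", the helper top_weapons and the sample rows are assumptions for illustration, not values taken from the dataset.

```python
import pandas as pd

# Hypothetical sample rows; a real run would load the Kaggle CSV instead.
sample = pd.DataFrame({
    "armed": ["gun", "knife", "gun", "unarmed", "gun", "toy weapon", "knife", "gun"],
})

def top_weapons(df, column="armed", n=10):
    """Return the n most frequent values in `column` with their counts."""
    # value_counts() sorts by frequency in descending order.
    return df[column].value_counts().head(n)

top = top_weapons(sample)
print(top)
```

The resulting Series could then be passed to a bar-plot function from any of the plotting libraries listed above.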

2. Count of different races of the victims

3. How did the victims die?

4. Comparing victims by gender

5. Distribution of police shootings by state

6. 20 cities with the most shootings

7. Distribution of Gender and Age

8. Frequency of arms category used

9. Mental illness check

10. Hypothesis

Null hypothesis: Attack threats make up 64% of all threats.

Alternative hypothesis: Attack threats do not make up 64% of all threats.

Since the p-value = 0.416366 > 0.05, we fail to reject the null hypothesis: the data are consistent with attack threats making up 64% of all threats. The bar plot of threat levels illustrates this.
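
The test used in the notebook is not shown, so the following is a sketch of one common choice, a two-sided one-sample z-test for a proportion, written with the standard library only. The function name and the counts (650 out of 1000) are hypothetical and are not taken from the dataset, so the sketch will not reproduce the p-value reported above.

```python
import math

def one_sample_proportion_ztest(successes, n, p0):
    """Two-sided z-test of H0: true proportion == p0.

    The standard error is computed under the null proportion p0.
    Returns (z statistic, two-sided p-value).
    """
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Standard normal CDF evaluated at |z| via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical counts for illustration only.
z, p = one_sample_proportion_ztest(successes=650, n=1000, p0=0.64)
print(f"z = {z:.3f}, p-value = {p:.3f}")
if p > 0.05:
    print("Fail to reject H0 at the 5% level")
else:
    print("Reject H0 at the 5% level")
```

A p-value above 0.05 means only that the sample does not contradict the hypothesised 64%; it does not prove that the true share is exactly 64%.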

11. Number of men/women fleeing vs not fleeing

12. Body Camera off: All Races

13. Race vs Age Distribution

14. Arms category: Other than guns

15. Number of shootings in each state where the victim was armed with a gun

16. Relationship between mental illness and weapon use
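
One possible way to tabulate such a relationship is a contingency table. The sketch below uses pandas.crosstab; the column names "signs_of_mental_illness" and "arms_category" and the sample rows are assumptions for illustration, not values from the dataset.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed, not confirmed by the report.
sample = pd.DataFrame({
    "signs_of_mental_illness": [True, False, False, True, False, False],
    "arms_category": ["Guns", "Guns", "Sharp objects", "Sharp objects", "Unarmed", "Guns"],
})

# Raw counts: rows are mental-illness flags, columns are weapon categories.
table = pd.crosstab(sample["signs_of_mental_illness"], sample["arms_category"])

# Row-wise shares, so each row sums to 1 and groups of different size are comparable.
shares = pd.crosstab(
    sample["signs_of_mental_illness"],
    sample["arms_category"],
    normalize="index",
)

print(table)
print(shares)
```

The table of counts or shares could be drawn as a heatmap or as grouped bars.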

17. Trend in the number of shootings over the years
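
A yearly trend of this kind could be derived by counting records per calendar year. The following is a minimal sketch; the column name "date", the helper shootings_per_year and the sample rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the column name "date" is an assumption.
sample = pd.DataFrame({
    "date": ["2015-01-02", "2015-06-30", "2016-03-14",
             "2017-11-05", "2017-12-25", "2017-02-01"],
})

def shootings_per_year(df, date_column="date"):
    """Count rows per calendar year, sorted chronologically."""
    years = pd.to_datetime(df[date_column]).dt.year
    # sort_index() orders by year rather than by frequency.
    return years.value_counts().sort_index()

per_year = shootings_per_year(sample)
print(per_year)
```

The resulting Series is indexed by year, which suits a line graph. A partial final year in the data would show up as an artificial drop.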

18. Finding Outliers by age

The boxplot of age above shows that there are outliers. We now look at these outliers in more detail.

The box plot treats people older than 72 as outliers.

Since a significant number of people (37) have ages that the box plot treats as outliers, we keep them in the analysis rather than deleting them.
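
A box plot conventionally flags points beyond 1.5 times the interquartile range from the quartiles. The sketch below applies that rule with Pandas, assuming this is the rule behind the plot above. The function name iqr_outliers and the ages are hypothetical, so the fences it prints differ from the threshold of 72 found in the dataset.

```python
import pandas as pd

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the box-plot rule."""
    s = pd.Series(values, dtype="float64")
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Points strictly outside the fences are treated as outliers.
    outliers = s[(s < lower) | (s > upper)]
    return lower, upper, outliers

# Hypothetical ages for illustration only.
ages = [22, 25, 27, 30, 31, 33, 35, 38, 41, 45, 90]
lower, upper, outliers = iqr_outliers(ages)
print(f"fences: [{lower}, {upper}]")
print(f"number of outliers: {len(outliers)}")
```

Counting the flagged rows before deciding whether to drop them mirrors the reasoning above: a sizeable group of older victims is real data, not noise.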